Learning to Score System Summaries for Better Content Selection Evaluation
Authors
Abstract
The evaluation of summaries is a challenging but crucial task in the summarization field. In this work, we propose to learn an automatic scoring metric from the human judgments available in classical summarization datasets such as TAC-2008 and TAC-2009. Any existing automatic scoring metric can be included as a feature; the model learns the combination that correlates best with human judgments. The reliability of the new metric is tested in a further manual evaluation, in which we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.
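The core idea, learning a combination of existing metrics that best predicts human judgments, can be sketched as a simple regression. The following is a minimal illustration, not the authors' implementation: the feature set, the linear model, and the synthetic data standing in for TAC-style human responsiveness scores are all assumptions.

```python
# Sketch: treat existing automatic metrics as features and fit a
# regressor against human judgments. Synthetic data is used in place
# of real TAC-2008/2009 annotations (illustrative assumption).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Each row holds the scores of existing metrics for one system summary,
# e.g. [ROUGE-1, ROUGE-2, ROUGE-L, a JS-divergence-based score].
metric_features = rng.random((200, 4))

# Simulated human judgments: a hidden linear mix of the metrics plus noise.
human_scores = metric_features @ np.array([0.5, 0.3, 0.15, 0.05]) \
    + rng.normal(scale=0.05, size=200)

# Learn the combination of metrics that best predicts the human scores.
model = LinearRegression().fit(metric_features, human_scores)

# Score a new, unseen summary by combining its metric values.
new_summary_features = rng.random((1, 4))
predicted_score = float(model.predict(new_summary_features)[0])
print(round(predicted_score, 3))
```

In practice the learned model can be any regressor; a linear one is used here only because it makes the "learned combination of metrics" directly readable from the coefficients.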
Similar Resources
Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries
In this article, we propose a machine-learning method for evaluating the content and linguistic quality of text summaries. The method combines multiple features to build predictive models that evaluate the content and linguistic quality of new (unseen) summaries constructed from the same source documents as the summaries used in training and the validat...
Deep Learning for Automatic Summary Scoring
Automatic summary scoring is widely used by summarization system developers to test different algorithms and to tune their systems. We have developed a new approach based on representation learning, using both unsupervised and supervised components, to score a summary from examples of manually evaluated summaries. Our deep learning approach greatly surpassed ROUGE in terms of co...
Merging Multiple Features to Evaluate the Content of Text Summary
In this paper, we propose a method that evaluates the content of a text summary using a machine learning approach. The method combines multiple features to build models that predict the PYRAMID scores of new summaries. We have tested several single and "ensemble learning" classifiers to build the best model. The evaluation of a summarization system is made using the average of the ...
Using SUMMA for Language Independent Summarization at TAC 2011
The paper describes a language-independent, multi-document, centroid-based summarization system. The system was evaluated in the 2011 TAC Multilingual Summarization pilot task, where summaries were automatically produced for document clusters in Arabic, English, French and Hindi. The system performed reasonably well in content selection for languages such as Arabic and Hindi and medium per...
Towards Multi-Document Summarization of Scientific Articles: Making Interesting Comparisons with SciSumm
We present a novel unsupervised approach to the problem of multi-document summarization of scientific articles, in which the document collection is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic-based clustering of fragments extracted from each co-cited article and relevance ranking using a query generate...